CP-NAS: Child-Parent Neural Architecture Search for 1-bit CNNs
Algorithm 9 Child-Parent NAS
Input: Training data, validation data
Parameter: Search hyper-graph $G$; $K = 8$; selection$(o_k^{(i,j)}) = 0$ for all edges
Output: Optimal structure $\alpha$
1: while $K > 1$ do
2:   for $t = 1, \ldots, T$ do
3:     for $e = 1, \ldots, K$ do
4:       Select an architecture by sampling (without replacement) one operation from $O^{(i,j)}$ for every edge;
5:       Construct the Child model and the Parent model with the same selected architecture, train both models for one epoch, and obtain the accuracy on the validation data; use Eq. 4.15 to compute the performance and assign it to all the sampled operations;
6:     end for
7:   end for
8:   Update $e(o_k^{(i,j)})$ using Eq. 4.16;
9:   Reduce the search space $\{O^{(i,j)}\}$ by removing the operation with the worst performance evaluation $e(o_k^{(i,j)})$;
10:  $K = K - 1$;
11: end while
12: return the optimal structure $\alpha$
4.3.3 Search Strategy for CP-NAS
As shown in Fig. 4.4, we randomly sample one operation from the $K$ operations in $O^{(i,j)}$ for every edge and then obtain the performance based on Eq. 4.15 by training the sampled Parent and Child networks for one epoch. Finally, we assign this performance to all the sampled operations. These steps are performed $K$ times by sampling without replacement, so that, for fairness, each operation on every edge receives exactly one accuracy.
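To make this step concrete, the following is a minimal Python sketch of one sampling round. The names `fair_sampling_round`, `search_space`, and `train_and_evaluate` are hypothetical: `search_space` stands in for the edge-wise candidate sets $O^{(i,j)}$, and `train_and_evaluate` for the one-epoch Child-Parent training and the performance computation of Eq. 4.15.

```python
import random

def fair_sampling_round(search_space, train_and_evaluate):
    """One round of fair sampling: K architectures drawn without
    replacement, so every operation on every edge is scored exactly once.
    `search_space` maps an edge (i, j) to its list of K candidate operations
    (all edges are assumed to hold the same number K of candidates)."""
    K = len(next(iter(search_space.values())))
    # Shuffling each edge's candidates and reading them off column by
    # column realizes sampling without replacement across the K rounds.
    shuffled = {edge: random.sample(ops, K) for edge, ops in search_space.items()}
    scores = {edge: {} for edge in search_space}
    for e in range(K):
        arch = {edge: ops[e] for edge, ops in shuffled.items()}
        z = train_and_evaluate(arch)   # Eq. 4.15, one epoch (assumed helper)
        for edge, op in arch.items():
            scores[edge][op] = z       # assign z to every sampled operation
    return scores
```

Repeating such a round $T$ times yields the $T$ performance values per operation used below.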
We repeat the complete sampling process $T$ times. Thus, each operation on every edge has $T$ performance values $\{z_{k,1}^{(i,j)}, z_{k,2}^{(i,j)}, \ldots, z_{k,T}^{(i,j)}\}$ calculated by Eq. 4.15. Furthermore, to reduce undesired fluctuation in the performance evaluation, we normalize the performance of the $K$ operations on each edge to obtain the final evaluation indicator as
\[
e(o_k^{(i,j)}) = \frac{\exp\{\bar{z}_k^{(i,j)}\}}{\sum_{k'} \exp\{\bar{z}_{k'}^{(i,j)}\}},
\tag{4.16}
\]
where $\bar{z}_k^{(i,j)} = \frac{1}{T} \sum_t z_{k,t}^{(i,j)}$. As the epochs increase, we progressively abandon the operation with the worst evaluation on each edge until only one operation remains per edge.
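As a sketch of Eq. 4.16 and the pruning step, the following NumPy snippet (with hypothetical names `evaluation_indicator` and `prune_worst`) averages the $T$ scores per operation, softmax-normalizes them across the $K$ operations of an edge, and drops the worst one.

```python
import numpy as np

def evaluation_indicator(z_history):
    """Eq. 4.16 for one edge: z_history has shape (K, T), one row of T
    performance values per operation."""
    z_bar = z_history.mean(axis=1)      # \bar{z}_k^{(i,j)}, mean over T
    z_bar = z_bar - z_bar.max()         # shift for numerical stability
    return np.exp(z_bar) / np.exp(z_bar).sum()

def prune_worst(ops, z_history):
    """Abandon the operation with the worst evaluation, so K shrinks by 1."""
    worst = int(np.argmin(evaluation_indicator(z_history)))
    return [op for i, op in enumerate(ops) if i != worst]
```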
4.3.4 Optimization of the 1-Bit CNNs
Inspired by XNOR and PCNN, we reformulate the binarized optimization of our unified framework as a Child-Parent optimization.

To binarize the weights and activations of CNNs, we introduce a kernel-level Child-Parent loss for binarized optimization in two respects. First, we minimize the discrepancy between the full-precision filters and the corresponding binarized filters. Second, we minimize an intra-class compactness term based on the output features. We then have the loss function
\[
L_{\hat{H}} = \sum_{c,l} \mathrm{MSE}\bigl(H_c^l, \hat{H}_c^l\bigr) + \frac{\lambda}{2} \sum_s \bigl\lVert f_{C,s}(\hat{H}) - \bar{f}_{C,s}(H) \bigr\rVert^2,
\tag{4.17}
\]
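A hedged PyTorch sketch of Eq. 4.17 follows, assuming the full-precision filters $H_c^l$ and binarized filters $\hat{H}_c^l$ are given as paired tensors and that the reference features $\bar{f}_{C,s}(H)$ have been precomputed; the function and argument names are illustrative, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def child_parent_loss(fp_filters, bin_filters, child_feats, ref_feats, lam):
    """Sketch of Eq. 4.17: kernel-level reconstruction error plus the
    feature-compactness term, weighted by lambda / 2.

    fp_filters, bin_filters: paired lists of tensors H_c^l and \hat{H}_c^l.
    child_feats: output features f_{C,s}(\hat{H}) of the binarized Child.
    ref_feats:   the reference features \bar{f}_{C,s}(H), same shape.
    """
    # First term: MSE between each full-precision filter and its binarized
    # counterpart, summed over layers and channels.
    recon = sum(F.mse_loss(h_hat, h) for h, h_hat in zip(fp_filters, bin_filters))
    # Second term: squared distance between the Child's output features and
    # the reference features, summed over samples s.
    compact = 0.5 * lam * ((child_feats - ref_feats) ** 2).sum()
    return recon + compact
```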